Data Representation and Efficient Communication in APIs
Let's learn about some data representation considerations when developing an API.
Introduction
Clients and servers communicate by exchanging data using a request-response model. An essential aspect of this communication is the data format(s) used for exchanging information, and the choice of format can make the difference between an effective API and an ineffective one. A key part of API design is therefore choosing a data format that both the client and the server can produce and parse easily.
Choosing a data format is only the first step toward a standardized flow of information between the sending and receiving parties. As industry demands change, the chosen data formats will inevitably need to change or be updated over time. Changes on the server side may also require updates on the client side. This is relatively easy for browser-based clients, where a simple refresh may be enough to move the client to a newer version. But for clients that use installed software (such as mobile applications) to interact with the service, we often rely on rolling software updates, where the newer version is installed gradually across clients. Therefore, data representation and data versioning are important concerns when designing an API.
This lesson discusses why data representation/formats matter and what considerations should be made when choosing suitable data format(s).
Representation considerations
In general, information should be communicated in the simplest form that the recipient can comprehend. In API design, the goal is not as straightforward, but the choice of data representation is undoubtedly one of the main factors. Here are some points to consider when choosing a data format:
Low latency: User-perceived latency is one of the most common concerns when choosing a data format. API designers often prefer data structures that facilitate fast network transfers. Keeping response times within certain limits may also be part of an API's service-level agreement (SLA). For example, APIs for time-critical applications such as live video conferencing and real-time surveillance systems can't afford network delays.
Fast processing: Human-readable formats are desirable for scenarios where frequent debugging is required, especially when third-party applications might be using our API. However, they are often not as machine-friendly as an appropriate binary format. The time required to process information is also important when choosing data representation, especially when dealing with Big Data and devices with low processing power (for example, IoT devices designed for small payloads to be more cost-efficient).
Multiple-format support: The choice of data format is also influenced by the following questions when dealing with public APIs utilized by other applications:
How many formats will our API support?
What schema will we use to encode and decode these formats?
How will we preserve the data while it’s converted from one format to another?
Many languages provide support for encoding and decoding data between formats, but the conversion can be inefficient in terms of processing time and the size of the transformed data.
Restricted data format: Sometimes, the choice of data formats is subject to business goals. In such scenarios, there may be formats available that are optimal in terms of API performance. Still, depending on the company's business model and expertise, we may have to stick to specific data formats to reduce costs and satisfy business logic. For example, it's extremely difficult for a large organization (such as Google) to switch from one data format to another when it has several interconnected services (Maps, Search, YouTube, and so on) that frequently exchange data with each other. In addition, changing the data format can also affect how the data is stored in the database, requiring extra transformation work and processing power.
Custom data format: Application developers are free to invent their own data encoding formats if there's a strong need for them. However, this practice comes at a cost: the data might only be understood by their own applications and might not interoperate with third parties.
Flexibility: APIs change over time, and newer versions may include changes in the data formats. When rolling out software updates, there may be situations where we have to roll back changes due to unexpected bugs or security reasons. It can also take hours to fully roll out changes. During this period, at least two versions of the data must coexist without disrupting the service. Therefore, it's necessary to have some flexibility when changing data formats already in use. Data formats can show the following types of flexibility:
Forward compatible: New attributes/properties can be added to the request-response data in such a way that existing (older) code can still read data written by a newer version of the code.
Backward compatible: Changes to the attributes/properties of the request-response data are made in such a way that new code can still read and understand data written by an older version of the code.
Fully compatible: The data format is both backward and forward compatible with the changes made to the request-response data.
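The compatibility types above can be illustrated with a minimal sketch. It assumes a JSON payload and a hypothetical change in which a newer version of the code adds an optional `email` field; the function names and fields are made up for illustration and aren't part of any particular API.

```python
import json

def encode_v1(name):
    """Old writer: only knows about the 'name' field."""
    return json.dumps({"name": name})

def encode_v2(name, email):
    """New writer: adds the (hypothetical) 'email' field."""
    return json.dumps({"name": name, "email": email})

def decode_v1(payload):
    """Old reader: keeps the fields it knows and ignores unknown ones.
    Ignoring unknown fields is what makes the change forward compatible."""
    data = json.loads(payload)
    return {"name": data["name"]}

def decode_v2(payload):
    """New reader: falls back to a default when 'email' is missing.
    The default is what makes the change backward compatible."""
    data = json.loads(payload)
    return {"name": data["name"], "email": data.get("email")}

# Old code reading data written by new code (forward compatibility).
old_reads_new = decode_v1(encode_v2("Ada", "ada@example.com"))
# New code reading data written by old code (backward compatibility).
new_reads_old = decode_v2(encode_v1("Ada"))

print(old_reads_new)  # {'name': 'Ada'}
print(new_reads_old)  # {'name': 'Ada', 'email': None}
```

Because both directions work here, this particular change would count as fully compatible. Renaming or removing the `name` field, by contrast, would break the old reader.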
Point to Ponder
Question
How bad would things get if we pick an inappropriate data representation format?
Efficiently communicating the right data is one of the core functions of an API, and selecting an inappropriate data exchange format can seriously harm it. Primarily, converting from one format to another consumes computational resources and adds latency for the client. Additionally, correctness issues can arise if one of the formats changes and our conversion code isn't updated to handle the change correctly.
For example, if a consumer of an API operates on one data format (X) and the API sends data in a different data format (Y), the conversion performed to access the data may result in data corruption and inaccuracies under different operating conditions, making the API unusable.
Therefore, a wrong choice of data format can be very detrimental to the working of the API and may even result in the API becoming deprecated.
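As one small illustration of how conversion can silently alter data, the sketch below round-trips a hypothetical record through JSON using Python's standard `json` module. The record itself is invented for the example; the point is only that the in-memory representation and the wire format don't support the same set of types.

```python
import json

# Hypothetical in-memory record: an integer key and a tuple value.
original = {1: "one", "point": (3, 4)}

# Convert to JSON text and back again.
round_tripped = json.loads(json.dumps(original))

print(round_tripped)              # {'1': 'one', 'point': [3, 4]}
print(round_tripped == original)  # False
```

JSON object keys are always strings and JSON has no tuple type, so the integer key comes back as the string `'1'` and the tuple comes back as a list. No error is raised, which is why this kind of mismatch can go unnoticed until a consumer depends on the original types.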
The best data format for a design problem is a tradeoff between the factors mentioned above and the features provided by a specific format. In general, a format is considered to be a good choice if it supports the following features:
Human-readable: Easy to read and debug by developers
Low latency: Fast to transmit over the wire
Standardized: Follows a well-defined pattern in the industry
Machine friendly: Needs less time to be processed by machines
Interoperable: Easy to serialize and deserialize data into different formats
Flexible: Easy to introduce and manage changes over time
We can divide the different data formats available into two broad categories:
Textual data formats
Binary data formats
We’ll see how they meet the criteria given above in the coming lessons.
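To get a rough feel for the difference between the two categories, here is a minimal sketch that encodes the same record once as JSON text and once in a fixed binary layout using Python's standard `struct` module. The sensor reading and the binary layout (a 16-bit ID followed by two 32-bit floats) are assumptions chosen for illustration, not a standard format.

```python
import json
import struct

# Hypothetical sensor reading.
reading = {"sensor_id": 17, "temperature": 21.5, "humidity": 40.25}

# Textual encoding: self-describing and human-readable.
text_payload = json.dumps(reading).encode("utf-8")

# Binary encoding with an assumed layout: big-endian unsigned 16-bit ID,
# then two 32-bit floats. Field names are not sent; both sides must
# already agree on the layout.
LAYOUT = ">Hff"
binary_payload = struct.pack(
    LAYOUT, reading["sensor_id"], reading["temperature"], reading["humidity"]
)

print(len(text_payload), len(binary_payload))

# Decoding needs the same layout string that was used for encoding.
decoded = struct.unpack(LAYOUT, binary_payload)
print(decoded)
```

The binary payload here is 10 bytes, considerably smaller than the JSON text, but it can't be read or debugged without knowing the layout. That is the tradeoff between the human-readable and machine-friendly criteria listed above.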
Quiz
Let's test our understanding of data format considerations by matching the following columns:
Column A:
Low latency can be achieved by…
Data in machine-friendly format…
The term interoperability refers to…
The advantages of a custom data format are…
Column B:
…requires less processing time and can be automatically read by computing devices.
…easy to add changes and manage updates when using first-party applications.
…utilizing compact data structures to speed up network transmission.
…the ability of different systems to easily store and retrieve data during transmission.